Update prp to handle mykrobe csv format #13

Merged
merged 29 commits into master from 12-handle-mykrobe-csv-output
Jan 4, 2024

ryanjameskennedy
Collaborator

No description provided.

@ryanjameskennedy ryanjameskennedy marked this pull request as draft December 27, 2023 16:28
@mhkc mhkc marked this pull request as ready for review January 2, 2024 09:13
@mhkc mhkc self-requested a review January 2, 2024 09:17
Collaborator

@mhkc mhkc left a comment

The Mykrobe parser seems to be parsing resistance variants as resistance genes.

For instance, katG_S315T from the test output file is being parsed as a gene and not as a variant.

CHANGELOG.md Outdated
Collaborator

If this PR is also intended to be prepared for release, we should add:

## [Unreleased]

### Added

### Fixed

### Changed

@ryanjameskennedy
Collaborator Author

katG is the gene that the variant is on? Or should I remove this entirely?

@mhkc
Collaborator

mhkc commented Jan 2, 2024

Yes, katG is the gene name where the mutation was found.

Mykrobe reports variants in the following format:

The variants column contains entries usually in the format <gene>_<amino acid change>-<DNA change>:<ref depth>:<alt depth>:<genotype confidence>. For example, rpsL_K43R-AAG781686AGG:0:513:3132 means K to R amino acid change at position 43 in the rpsL gene, which is AAG to AGG at position 781686 in the genome, with 0 reads supporting the reference, 513 depth on the alternative allele, and a genotype confidence of 3132.

https://github.com/Mykrobe-tools/mykrobe/wiki/AMR-prediction-output
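As a rough illustration of the format described above, here is a minimal Python sketch that splits a single variants entry into its parts. It is not the parser from this PR: the `MykrobeVariant` container, the `parse_variant` function, and the field names are hypothetical, and the sketch assumes that gene names contain no underscore and that the three trailing fields are numeric.

```python
import re
from typing import NamedTuple


class MykrobeVariant(NamedTuple):
    """Hypothetical container for one entry; not prp's actual data model."""
    gene: str
    change: str
    ref_nt: str
    position: int
    alt_nt: str
    ref_depth: float
    alt_depth: float
    confidence: float


# <ref bases><genome position><alt bases>, e.g. AAG781686AGG
DNA_CHANGE_RE = re.compile(r"^([A-Z]+)(\d+)([A-Z]+)$")


def parse_variant(entry: str) -> MykrobeVariant:
    # Split the three numeric fields off the right-hand side first.
    name, ref_depth, alt_depth, confidence = entry.strip().rsplit(":", 3)
    # The last "-" separates the gene-level change from the DNA change;
    # splitting from the right tolerates a "-" inside the gene-level change.
    gene_change, dna_change = name.rsplit("-", 1)
    # Assumption: the gene name itself contains no underscore.
    gene, change = gene_change.split("_", 1)
    match = DNA_CHANGE_RE.match(dna_change)
    if match is None:
        raise ValueError(f"unexpected DNA change: {dna_change!r}")
    ref_nt, position, alt_nt = match.groups()
    return MykrobeVariant(
        gene, change, ref_nt, int(position), alt_nt,
        float(ref_depth), float(alt_depth), float(confidence),
    )


variant = parse_variant("rpsL_K43R-AAG781686AGG:0:513:3132")
print(variant.gene, variant.change, variant.position)
```

Splitting from the right keeps the numeric fields and the DNA change intact even when the gene-level change is long, as with the deletions discussed later in this review.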

@ryanjameskennedy
Collaborator Author

No, I understand that. I'm just wondering whether you mean that I should report only a variant in Mykrobe's case?

prp/parse/phenotype/mykrobe.py Outdated
prp/parse/phenotype/mykrobe.py Outdated
prp/parse/phenotype/mykrobe.py Outdated
prp/parse/phenotype/mykrobe.py Outdated
prp/parse/typing.py
prp/parse/typing.py Outdated
prp/parse/typing.py Outdated
Collaborator

@mhkc mhkc left a comment

It's getting closer.

The parser can't handle cases where multiple variants yield the same resistance. Take, for instance, rifampicin resistance in the test file.

Mykrobe reports that it is caused by three deletions:

  • rpoB_TCATGGA1298T-TCATGGA761104T:152:2108:14397
  • rpoB_TTCATGGA1297TT-TTCATGGA761103TT:303:1978:13279
  • rpoB_CTGAGCCAATTCATGGACCAGAACAACCC1288CTGAGCCAATTCCAGAACAACCC-CTGAGCCAATTCATGGACCAGAACAACCC761094CTGAGCCAATTCCAGAACAACCC:1:2505:16677

The current iteration of the parser reports only one variant:

{'alt_aa': None,
 'alt_nt': 'T',
 'annotation': None,
 'ass_end_pos': None,
 'ass_start_pos': None,
 'change': 'TCATGGA1298T',
 'close_seq_name': None,
 'contig_id': None,
 'depth': 16677.0,
 'drugs': ['rifampicin'],
 'element_subtype': None,
 'element_type': 'AMR',
 'gene_symbol': 'rpoB',
 'method': 'kmer_count',
 'nucleotide_change': 'c.TCATGGA761104T',
 'phenotypes': [],
 'position': 761104,
 'protein_change': 'p.TCATGGA1298T',
 'ref_aa': None,
 'ref_database': None,
 'ref_id': None,
 'ref_nt': 'TCATGGA',
 'res_class': None,
 'res_subclass': None,
 'sequence_name': None,
 'strand': None,
 'target_length': None,
 'type': None,
 'variant_type': 'deletion'}
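
In the record above, `change` matches the first deletion while the `depth` value of 16677.0 matches the last field of the third entry, so fields from different entries appear to be mixed. A minimal Python sketch of emitting one record per entry follows. It is not the fix that was merged: the `split_variants` function and the `dna_change`, `ref_depth`, `alt_depth`, and `confidence` keys are hypothetical, and the `;` separator between entries is an assumption about the Mykrobe CSV, exposed here as a parameter.

```python
def split_variants(field, sep=";"):
    """Return one record per variant entry in a Mykrobe variants field."""
    records = []
    for entry in field.split(sep):
        entry = entry.strip()
        if not entry:
            continue  # skip empty pieces, e.g. from a trailing separator
        # Peel the three numeric fields off the right, then the DNA change.
        name, ref_depth, alt_depth, confidence = entry.rsplit(":", 3)
        gene_change, dna_change = name.rsplit("-", 1)
        # Assumption: the gene name itself contains no underscore.
        gene, change = gene_change.split("_", 1)
        records.append({
            "gene_symbol": gene,
            "change": change,
            "dna_change": dna_change,
            "ref_depth": float(ref_depth),
            "alt_depth": float(alt_depth),
            "confidence": float(confidence),
        })
    return records


# The three rifampicin deletions quoted above, joined with the assumed separator.
field = ";".join([
    "rpoB_TCATGGA1298T-TCATGGA761104T:152:2108:14397",
    "rpoB_TTCATGGA1297TT-TTCATGGA761103TT:303:1978:13279",
    "rpoB_CTGAGCCAATTCATGGACCAGAACAACCC1288CTGAGCCAATTCCAGAACAACCC"
    "-CTGAGCCAATTCATGGACCAGAACAACCC761094CTGAGCCAATTCCAGAACAACCC:1:2505:16677",
])
for record in split_variants(field):
    print(record["gene_symbol"], record["dna_change"], record["confidence"])
```

Building each record inside the loop keeps every value tied to the entry it came from, so the depths and confidence of one deletion cannot end up attached to another.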

prp/parse/phenotype/mykrobe.py Outdated
@ryanjameskennedy
Collaborator Author

Solved!

Collaborator

@mhkc mhkc left a comment

Looks good! Well done!

@ryanjameskennedy ryanjameskennedy merged commit f113731 into master Jan 4, 2024
7 checks passed
@mhkc mhkc deleted the 12-handle-mykrobe-csv-output branch January 4, 2024 13:00
@ryanjameskennedy ryanjameskennedy linked an issue Jan 4, 2024 that may be closed by this pull request